unified theory
A unified theory for the origin of grid cells through the lens of pattern formation
There are currently two seemingly unrelated frameworks for understanding the hexagonal firing patterns of grid cells. Mechanistic models account for hexagonal firing fields as the result of pattern-forming dynamics in a recurrent neural network with hand-tuned center-surround connectivity. Normative models specify a neural architecture, a learning rule, and a navigational task, and observe that grid-like firing fields emerge from the constraints of solving this task. Here we provide an analytic theory that unifies the two perspectives by casting the learning dynamics of neural networks trained on navigational tasks as a pattern-forming dynamical system. This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks. Further, our theory proves that a nonnegativity constraint on firing rates induces a symmetry-breaking mechanism that favors hexagonal firing fields. We extend this theory to the case of learning multiple grid maps and demonstrate that optimal solutions consist of a hierarchy of maps with increasing length scales. These results unify previous accounts of grid cell firing and provide a novel framework for predicting the learned representations of recurrent neural networks.
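To make the mechanistic side concrete, here is a minimal sketch, not the paper's model, of why center-surround connectivity forms periodic patterns. Assume linearized rate dynamics dr/dt = -r + W * r on the plane, with W a difference of two unit-mass Gaussians (a narrow excitatory center of width sigma_center minus a wider surround of width sigma_surround). In Fourier space each mode with wavenumber k grows at rate W_hat(k) - 1, so the fastest-growing mode sits at the peak of W_hat, which is nonzero for a center-surround kernel and sets the spatial period 2*pi/k*. Setting the derivative of W_hat to zero gives k*^2 = 4 ln(sigma_surround/sigma_center) / (sigma_surround^2 - sigma_center^2). The function names and parameters are hypothetical, and this linear analysis only selects a length scale; it does not capture the hexagonal selection the abstract attributes to the nonnegativity constraint.

```python
import math

def dog_spectrum(k, sigma_center, sigma_surround):
    """Fourier transform of a 2D difference of unit-mass Gaussians (center minus surround)."""
    return math.exp(-0.5 * (sigma_center * k) ** 2) - math.exp(-0.5 * (sigma_surround * k) ** 2)

def analytic_peak(sigma_center, sigma_surround):
    """Wavenumber where the spectrum peaks, from setting its derivative in k to zero."""
    return math.sqrt(4.0 * math.log(sigma_surround / sigma_center)
                     / (sigma_surround ** 2 - sigma_center ** 2))

def numeric_peak(sigma_center, sigma_surround, k_max=20.0, steps=200_000):
    """Brute-force argmax of the spectrum over a fine grid of k values in (0, k_max]."""
    best_k, best_val = 0.0, -math.inf
    for i in range(1, steps + 1):
        k = k_max * i / steps
        val = dog_spectrum(k, sigma_center, sigma_surround)
        if val > best_val:
            best_k, best_val = k, val
    return best_k

# The fastest-growing mode of dr/dt = -r + W * r; its wavelength sets the pattern period.
k_star = analytic_peak(1.0, 2.0)
print(k_star, numeric_peak(1.0, 2.0), 2 * math.pi / k_star)
```

Because the spectrum is zero at k = 0 (equal-mass center and surround) and decays at large k, the uniform state is not the most unstable one; a finite-wavelength pattern wins instead.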
Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
With the launch of ChatGPT, large language models (LLMs) have attracted global attention. LLMs are now widely used for article writing, raising concerns about intellectual property protection, personal privacy, and academic integrity. In response, AI-text detection has emerged to distinguish between human- and machine-generated content. However, recent research indicates that these detection systems often lack robustness and struggle to correctly classify perturbed texts. Systematic evaluations of detection performance in real-world applications are currently lacking, as is a comprehensive examination of perturbation techniques and detector robustness. To bridge this gap, our work simulates real-world scenarios in both informal and professional writing and explores the out-of-the-box performance of current detectors. Additionally, we have constructed 12 black-box text perturbation methods to assess the robustness of current detection models across various perturbation granularities. Furthermore, through adversarial learning experiments, we investigate the impact of perturbation data augmentation on the robustness of AI-text detectors. We have released our code and data at https://github.com/zhouying20/ai-text-detector-evaluation.
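The abstract does not list the 12 perturbation methods, so the following is only an illustrative sketch of what a character-level, black-box perturbation can look like: it rewrites the text without any access to the detector. It swaps randomly chosen pairs of adjacent letters at an approximate rate. The function name swap_adjacent_chars and its rate and seed parameters are hypothetical and are not taken from the released code.

```python
import random

def swap_adjacent_chars(text: str, rate: float, seed: int = 0) -> str:
    """Swap randomly chosen pairs of adjacent letters (a character-level perturbation).

    rate: probability of swapping at each eligible position (hypothetical knob).
    Only letter-letter pairs are swapped, so spaces and punctuation stay in place.
    """
    rng = random.Random(seed)  # seeded so a perturbed dataset is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)

original = "The quick brown fox jumps over the lazy dog."
print(swap_adjacent_chars(original, rate=0.3, seed=1))
```

A detector evaluation would then compare scores on the original and perturbed texts; word- or sentence-level perturbations would replace this function while keeping the same black-box interface.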
Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods
Beznosikov, Aleksandr, Gorbunov, Eduard, Berard, Hugo, Loizou, Nicolas
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. The success of the method led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed variants with compression, which were extensively studied in the literature, especially during the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications and have been developed separately in various communities. A key to our unified framework is a parametric assumption on the stochastic estimates. Via our general theoretical framework, we either recover the sharpest known rates for the known special cases or tighten them. Moreover, to illustrate the flexibility of our approach we develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments.
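As a minimal sketch of plain SGDA (none of the new variants named above), consider the toy strongly-convex-strongly-concave game f(x, y) = (mu/2) x^2 + x*y - (mu/2) y^2, whose saddle point is (0, 0). Stochasticity is modeled, as an assumption, by adding Gaussian noise to the exact gradients. Each step descends in x and ascends in y simultaneously; with a constant step size, the iterates converge to a noise-dominated neighborhood of the saddle point rather than to the point itself. The function name sgda and its parameters are illustrative.

```python
import random

def sgda(x0, y0, mu=1.0, step=0.05, noise=0.1, iters=2000, seed=0):
    """Simultaneous SGDA on f(x, y) = mu/2 * x**2 + x*y - mu/2 * y**2 (saddle at the origin)."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(iters):
        gx = mu * x + y + noise * rng.gauss(0.0, 1.0)  # noisy estimate of df/dx
        gy = x - mu * y + noise * rng.gauss(0.0, 1.0)  # noisy estimate of df/dy
        x, y = x - step * gx, y + step * gy  # descent in x, ascent in y, same old iterate
    return x, y

print(sgda(5.0, -5.0))             # lands near (0, 0), jittered by gradient noise
print(sgda(5.0, -5.0, noise=0.0))  # deterministic GDA converges to the saddle point
```

For this game the noise-free update is a rotation scaled by sqrt((1 - step*mu)^2 + step^2), which is below 1 for the default values; that contraction is what the deterministic run exhibits.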
Embrace Complexity (Part 4)
Data, Cloud and AI are three fundamental forces that are fuelling an exciting (and at times scary) technological phase transition where we all move from the smog of the industrial age into the neon glow of the information age. If you have been following this series then you will know that we have been using the network shape like a north star that guides us as we search for a unified theory that connects these three forces together. A unified theory that can be used by all organisations so that fewer of them are left behind. The graph-shaped data introduced in part 2 provides us with networked information, and networked information captures a richness and complexity that can only be found in the relationships that connect the parts together. The nuance and subtlety that exist in the lines connecting the dots.
Local SGD: Unified Theory and New Efficient Methods
We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as a special case of our general framework, including Local-SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both random and deterministic numbers of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel federated learning (FL) optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
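The "fixed points of local iterations" mentioned above can be seen in a minimal sketch of plain Local SGD/FedAvg, assuming each client m holds a one-dimensional quadratic loss f_m(w) = a_m/2 (w - c_m)^2 with its own curvature a_m (heterogeneous data). After H local steps of size step, client m maps w to c_m + q_m (w - c_m) with q_m = (1 - step*a_m)^H, so averaging has the fixed point sum((1 - q_m) c_m) / sum(1 - q_m). With H = 1 this equals the true minimizer sum(a_m c_m) / sum(a_m); with H > 1 it generally drifts away, which is the bias that shifted estimators such as SCAFFOLD's are designed to remove (the correction itself is not shown). All names here are hypothetical, and noise defaults to zero so the drift is visible exactly.

```python
import random

def local_sgd(a, c, step=0.1, local_steps=10, rounds=200, noise=0.0, seed=0):
    """Local SGD / FedAvg on client losses f_m(w) = a[m] / 2 * (w - c[m])**2."""
    rng = random.Random(seed)
    w = 0.0  # shared model held by the server
    for _ in range(rounds):
        local_models = []
        for a_m, c_m in zip(a, c):
            w_m = w  # each client starts from the current shared model
            for _ in range(local_steps):
                grad = a_m * (w_m - c_m) + noise * rng.gauss(0.0, 1.0)
                w_m -= step * grad
            local_models.append(w_m)
        w = sum(local_models) / len(local_models)  # one communication round: average
    return w

def global_optimum(a, c):
    """Minimizer of the average loss (1/M) * sum_m f_m(w)."""
    return sum(a_m * c_m for a_m, c_m in zip(a, c)) / sum(a)

def predicted_fixed_point(a, c, step, local_steps):
    """Fixed point of noise-free Local SGD, derived from the per-client maps above."""
    weights = [1.0 - (1.0 - step * a_m) ** local_steps for a_m in a]
    return sum(wt * c_m for wt, c_m in zip(weights, c)) / sum(weights)

a, c = [1.0, 4.0], [0.0, 1.0]
print(global_optimum(a, c), local_sgd(a, c, local_steps=1), local_sgd(a, c, local_steps=10))
```

Running it shows that one local step recovers the optimum, while ten local steps settle at a different point matching predicted_fixed_point.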
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Koloskova, Anastasia, Loizou, Nicolas, Boreiri, Sadra, Jaggi, Martin, Stich, Sebastian U.
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topologies. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
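A minimal sketch of decentralized SGD with pairwise gossip follows, under simplifying assumptions that are ours, not the paper's: nodes sit on a ring, node i holds the private loss f_i(x) = (x - c_i)^2 / 2, and in each iteration every node takes a local gradient step and then one randomly chosen ring edge averages its two endpoints. Picking a fresh edge each time is a simple instance of a changing topology; the paper's framework covers far more general time-varying mixing. Because pairwise averaging preserves the network average, the average iterate follows centralized gradient descent on the average loss, while individual nodes can keep a consensus error. Names are hypothetical.

```python
import random

def decentralized_sgd(c, x0, step=0.1, iters=2000, seed=0):
    """Ring of len(c) nodes: local gradient step, then one random pairwise gossip.

    Node i holds x[i] and the private loss f_i(x) = 0.5 * (x - c[i])**2.
    """
    rng = random.Random(seed)
    n = len(c)
    x = list(x0)
    for _ in range(iters):
        # 1) every node takes a gradient step on its own loss (no communication)
        x = [x_i - step * (x_i - c_i) for x_i, c_i in zip(x, c)]
        # 2) one random ring edge (i, i+1) averages its endpoints; the sum of x is unchanged
        i = rng.randrange(n)
        j = (i + 1) % n
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg
    return x

c = [float(i) for i in range(8)]
x = decentralized_sgd(c, [0.0] * 8)
print(sum(x) / len(x))                               # network average approaches mean(c) = 3.5
print(decentralized_sgd(c, c, step=0.0, iters=5000))  # pure gossip: all nodes reach 3.5
```

Setting step=0 isolates the gossip part, which on its own drives every node to the initial average.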
A unified theory for the origin of grid cells through the lens of pattern formation
Sorscher, Ben, Mel, Gabriel, Ganguli, Surya, Ocko, Samuel
There are currently two seemingly unrelated frameworks for understanding the hexagonal firing patterns of grid cells. Mechanistic models account for hexagonal firing fields as the result of pattern-forming dynamics in a recurrent neural network with hand-tuned center-surround connectivity. Normative models specify a neural architecture, a learning rule, and a navigational task, and observe that grid-like firing fields emerge from the constraints of solving this task. Here we provide an analytic theory that unifies the two perspectives by casting the learning dynamics of neural networks trained on navigational tasks as a pattern-forming dynamical system. This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks.
Physicist Stephen Hawking dies at 76
LONDON -- Stephen Hawking, Britain's most famous scientist, who dedicated his life to unlocking the secrets of the universe, has died at age 76. His children, Lucy, Robert and Tim, said in a statement carried by Britain's Press Association news agency on Wednesday: "We are deeply saddened that our beloved father passed away today. He was a great scientist and an extraordinary man whose work and legacy will live on for many years." Born on Jan. 8, 1942 -- 300 years to the day after the death of the father of modern science, Galileo Galilei -- he believed science was his destiny. But fate also dealt Hawking a cruel hand. Crippled by amyotrophic lateral sclerosis (ALS), which attacks the nerves controlling voluntary movement, he spent most of his life in a wheelchair. Hawking defied predictions that he would only live for a few years, overcoming the debilitating effects of ALS on his mobility and speech that left him paralyzed and able to communicate only via a computer speech synthesiser. "I am quite often asked: how do you feel about having ALS?" he once wrote. "The answer is, not a lot.
Using Machine Learning To Study The String Landscape
Is fundamental physics unified into a single theory governing all known phenomena, or are we forced to accept a fractured state of affairs where different phenomena are addressed by different theories? This question has long been of first importance to theoretical physicists. Einstein, for example, spent many of his later years searching for a unified theory, with little success. Despite his brilliance, the deck was stacked against him, as certain aspects of fundamental physics such as the strong and weak nuclear forces were only just being discovered at the end of his life. Today we have a more complete picture of the interactions of elementary particles and also a strong sense of what is difficult in the search for a unified theory: combining general relativity, Einstein's theory of gravity, with quantum mechanics.